Quantitative Seminar - 01/16/2025
Traditional educational research often fixates on average academic achievement.
Average performance and variability convey distinct information.
We adapt the Mixed-Effects Location Scale Model (MELSM) by incorporating a spike-and-slab prior into the scale component to select or shrink random effects.
Based on Bayes factors, we can decide whether a school is (in)consistent in its academic achievement.
Evidence for retaining the random effect is evidence of unusual variability.
The standard multilevel model (MLM) assumes a fixed within-school variance, potentially masking important differences in variability.
MELSM allows for the simultaneous estimation of a model for the means (location) and a model for the residual variance (scale).
Both sub-models are conceptualized as mixed-effects models.
\[\begin{equation} \textbf{v}_j= \begin{bmatrix} u_{0j} \\ t_{0j} \end{bmatrix} \sim \mathcal{N} \begin{pmatrix} \boldsymbol{0}= \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \boldsymbol{\Sigma}= \begin{bmatrix} \tau^2_{u_{0j}} & \tau_{u_{0j}t_{0j}} \\ \tau_{u_{0j}t_{0j}} & \tau^2_{t_{0j}} \end{bmatrix} \end{pmatrix} \end{equation}\]
Accounts for possible correlations among location and scale effects.
Allows the inclusion of specific predictors in both sub-models.
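As a rough illustration of the two sub-models, the sketch below simulates data in which each school has its own mean (location) and its own residual standard deviation (scale). It assumes intercept-only sub-models and a log link for the residual standard deviation; the names `gamma_0` and `eta_0` and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_schools, n_students = 50, 40

# Hypothetical intercepts (assumptions for this sketch).
gamma_0 = 0.0         # location intercept (average achievement)
eta_0 = np.log(1.0)   # scale intercept on the log-SD scale

# Hypothetical covariance of the location (u_0) and scale (t_0) random effects.
Sigma = np.array([[0.25, 0.03],
                  [0.03, 0.09]])

# One pair of random effects (u_0j, t_0j) per school.
v = rng.multivariate_normal(np.zeros(2), Sigma, size=n_schools)
u_0, t_0 = v[:, 0], v[:, 1]

# Location sub-model: school-specific mean.
mu = gamma_0 + u_0
# Scale sub-model: school-specific residual SD via a log link (assumed).
sigma = np.exp(eta_0 + t_0)

# Student scores: rows are schools, columns are students.
y = rng.normal(mu[:, None], sigma[:, None], size=(n_schools, n_students))

# Observed within-school SDs should track the simulated sigma values.
observed_sd = y.std(axis=1, ddof=1)
```

Comparing `observed_sd` with `sigma` shows the heterogeneity in within-school variability that a model with a single residual variance would average away.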
We incorporate the spike-and-slab prior as a method of variable selection of random effects in the scale model.
The model is allowed to switch between two assumptions:
\[\begin{equation} \color{lightgray}{ \textbf{v}= \begin{bmatrix} u_0 \\ t_0 \end{bmatrix} \sim \mathcal{N}} \begin{pmatrix} \color{lightgray}{ \boldsymbol{0}= \begin{bmatrix} 0 \\ 0 \end{bmatrix},} \boldsymbol{\Sigma}= \begin{bmatrix} \tau^2_{u_0} & \tau_{u_0t_0} \\ \tau_{u_0t_0} & \tau^2_{t_0} \end{bmatrix} \end{pmatrix} \end{equation}\]
\[\begin{equation} \label{eq:cholesky_approach} \textbf{L} = \begin{pmatrix} 1 & 0 \\ \rho_{u_0t_0} & \sqrt{1 - \rho_{u_0t_0}^2} \end{pmatrix} \end{equation}\]
If we multiply \(\textbf{L}\) by the random effect standard deviations, \(\boldsymbol{\tau}\), and scale it with a standard normally distributed \(\boldsymbol{z}\), we obtain \(\textbf{v}\):
\[\begin{equation} \textbf{v} = \boldsymbol{\tau}\textbf{L}\boldsymbol{z} \end{equation}\]
The Cholesky decomposition allows expressing the random effects in terms of the standard deviations and correlations.
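The construction above can be checked numerically. This is a minimal sketch that reads \(\boldsymbol{\tau}\) as a diagonal matrix of standard deviations (an interpretation, since the slide writes it as a plain product); the values of `tau` and `rho` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical values chosen only for illustration.
tau = np.array([0.5, 0.3])  # SDs of the location (u_0) and scale (t_0) effects
rho = 0.4                   # correlation between u_0 and t_0

# Cholesky factor of the 2x2 correlation matrix.
L = np.array([[1.0, 0.0],
              [rho, np.sqrt(1.0 - rho**2)]])

# z: independent standard normal draws, one column per simulated school.
n_draws = 200_000
z = rng.standard_normal((2, n_draws))

# v = diag(tau) L z yields correlated random effects.
v = np.diag(tau) @ L @ z

# The empirical covariance of v should approximate diag(tau) L L' diag(tau).
emp_cov = np.cov(v)
target_cov = np.diag(tau) @ (L @ L.T) @ np.diag(tau)
```

The off-diagonal entry of `target_cov` equals `rho * tau[0] * tau[1]`, so the standard deviations and the correlation can be given priors separately.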
\[\begin{equation} \begin{aligned} u_{0j} &= \tau_{u_0}z_{ju_0}\\ t_{0j} &= \tau_{t_0}\left( \rho_{u_0t_0}z_{ju_0} + z_{jt_0}\sqrt{1 - \rho_{u_0t_0}^2} \right)\color{red}{\delta_{jt_0}} \end{aligned} \end{equation}\]
\[\begin{equation} t_{0j} = \tau_{t_0}\left( \rho_{u_0t_0}z_{ju_0} + z_{jt_0}\sqrt{1 - \rho_{u_0t_0}^2} \right)\color{red}{\delta_{jt_0}} \end{equation}\]
Each element of \(\boldsymbol{\delta}_j\) takes a value in \(\{0,1\}\) and follows \(\delta_{jk} \sim \text{Bernoulli}(\pi)\).
When a 0 is sampled, the random effect drops out of the equation and only the fixed effect remains.
\[\begin{equation} \label{eq:mm_delta} \sigma_{\varepsilon_{ij}} = \begin{cases} \exp(\eta_0 + 0), & \text{if }\delta_{jt_0} = 0 , \\ \exp(\eta_0 + t_{0j}), & \text{if }\delta_{jt_0} = 1 \end{cases} \end{equation}\]
Throughout the MCMC sampling process \(\delta\) switches between the spike and slab.
If \(\delta= 0\), the density “spikes” at the zero point mass;
If \(\delta= 1\), the standard normal prior, \(z_{jk}\), is retained and scaled by \(\tau_k\), introducing the “slab”.
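The switching behaviour can be sketched as a single prior draw per school. The example only illustrates how the indicator zeroes out the scale random effect. It is not the sampler used in the analysis, and all numeric values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical values chosen only for illustration.
eta_0 = np.log(10.0)  # fixed effect of the scale model (log residual SD)
tau_t0 = 0.3          # SD of the scale random effect
rho = 0.4             # correlation between location and scale effects
pi = 0.5              # prior inclusion probability
n_schools = 8

z_u = rng.standard_normal(n_schools)
z_t = rng.standard_normal(n_schools)

# Inclusion indicator: 0 selects the spike, 1 selects the slab.
delta = rng.binomial(1, pi, size=n_schools)

# Scale random effect, multiplied by the indicator as in the equation above.
t_0 = tau_t0 * (rho * z_u + z_t * np.sqrt(1.0 - rho**2)) * delta

# Residual SD per school: exp(eta_0) when delta = 0, exp(eta_0 + t_0j) otherwise.
sigma = np.exp(eta_0 + t_0)
```

Schools with `delta == 0` share the common residual SD `exp(eta_0)`, while the others get a school-specific value.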
\[\begin{align} \label{eq:pip_theorical} Pr(\delta_{jk} = 1 | \textbf{Y}) = \frac{Pr(\textbf{Y} | \delta_{jk} = 1)Pr(\delta_{jk} = 1)}{Pr(\textbf{Y})} \end{align}\]
The posterior inclusion probability (PIP) is estimated by the proportion of MCMC samples where \(\delta_{jk} = 1\):
\[\begin{align} \label{eq:pip} Pr(\delta_{jk} = 1 | \textbf{Y}) = \frac{1}{S} \sum_{s = 1}^S \delta_{jks} \end{align}\]
where \(S\) is the total number of posterior samples.
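In code, this estimate is a column mean over the sampled indicators. The sketch below uses simulated 0/1 draws in place of real MCMC output; the array `delta_draws` and the inclusion rates in `true_incl` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

S, n_schools = 4000, 3

# Stand-in for posterior draws of the indicators: rows are samples,
# columns are schools. Real draws would come from the fitted model.
true_incl = np.array([0.1, 0.5, 0.9])  # hypothetical inclusion rates
delta_draws = rng.binomial(1, true_incl, size=(S, n_schools))

# PIP per school: proportion of samples in which delta = 1.
pip = delta_draws.mean(axis=0)
```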
If there is evidence for zero variance in the scale random effects, the model reduces to the MLM assumption:
\[\varepsilon_{ij}\sim\mathcal{N}(0, \sigma_\varepsilon)\]
If not, the MELSM assumption of variance heterogeneity is retained:
\[\varepsilon_{ij}\sim\mathcal{N}(0, \sigma_{\varepsilon_{ij}})\]
The PIP gives us a probabilistic measure and does not perform automatic variable selection. We estimate the strength of evidence through Bayes factors:
\[\begin{align} \label{eq:bf_pip} BF_{10j} = \frac{Pr(\delta_{jk} = 1 | \textbf{Y}) }{1 - Pr(\delta_{jk} = 1 | \textbf{Y}) } \end{align}\]
A BF\(_{10}\) > 3 corresponds to a PIP > 0.75 when the prior inclusion probability \(\pi\) is 0.5.
With equal prior odds, inclusion of this random effect is then at least three times more probable than its exclusion.
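The correspondence between PIP and Bayes factor can be verified with a small helper. The function name `bayes_factor_10` is hypothetical. Dividing by the prior odds is not shown on the slide; it is the standard relation between posterior odds and a Bayes factor, added here as an assumption so the sketch also covers \(\pi \neq 0.5\).

```python
def bayes_factor_10(pip: float, prior_pi: float = 0.5) -> float:
    """Bayes factor for including a random effect, given its PIP.

    With prior_pi = 0.5 the prior odds equal 1, so the result reduces
    to the posterior odds PIP / (1 - PIP).
    """
    posterior_odds = pip / (1.0 - pip)
    prior_odds = prior_pi / (1.0 - prior_pi)
    return posterior_odds / prior_odds


bf_at_threshold = bayes_factor_10(0.75)  # PIP at the 0.75 cutoff
bf_strong = bayes_factor_10(0.95)        # a school with a high PIP
```

With the default prior, a PIP of 0.75 sits exactly at the BF\(_{10}\) = 3 threshold.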
We use a subset of data from the 2021 Brazilian Evaluation System of Elementary Education (Saeb) test.
The subset covers math scores from 11th- and 12th-grade students across 160 randomly selected schools, for a total of 11,386 students.
The analysis compares three spike-and-slab MELSMs (SS-MELSMs) with varying levels of complexity.
The models were fitted using the ivd package in R (Rast & Carmo, 2024).
All models were fitted with six chains of 3,000 iterations and 12,000 warm-up samples.
We assessed convergence using \(\hat{R}\) and estimation efficiency using the effective sample size (ESS).
The models were compared for predictive accuracy using PSIS-LOO cross-validation.
Model 1 identified eight schools with PIPs exceeding 0.75, suggesting notable deviations from the average within-school variance.
By incorporating SES covariates, Model 2 significantly outperformed Model 1 in terms of predictive accuracy, \(\Delta\widehat{\text{elpd}}_{\text{loo}}= -43.6 (10.5)\).
Model 3 was practically indistinguishable from Model 2; the inclusion of a random slope for the student-level SES did not improve the model’s predictive accuracy, \(\Delta\widehat{\text{elpd}}_{\text{loo}}= -1.3 (0.6)\).
The SS-MELSM helps identify schools that deviate from the average in terms of within-school variability.
PIPs and Bayes factors provide a probabilistic measure for random effects inclusion.
Identifying variability can guide resource allocation or teaching interventions.
Estimating the SS-MELSM can be computationally intensive for large-scale applications.
It is still not clear how model performance is affected by the choice of hyperparameters.
Ceiling and floor effects might compress within-school variability, affecting residual standard deviation estimation and interpretation.
The necessary sample sizes and number of observations per cluster remain uncertain.
Thank you to my co-authors, Prof. Philippe Rast and Dr. Donald Williams.
Read the preprint at
Beyond Averages with MELSM and Spike-and-Slab